Second order approximate ancillaries have evolved as the primary ingredient for recent likelihood
development in statistical inference. This uses quantile functions rather than the equivalent
distribution functions, and the intrinsic ancillary contour is given explicitly as the plug-in esti- 
mate of the vector quantile function. The derivation uses a Taylor expansion of the full quantile 
function, and the linear term gives a tangent to the observed ancillary contour. For the scalar 
parameter case, there is a vector field that integrates to give the ancillary contours, but for the 
vector case, there are multiple vector fields and the Frobenius conditions for mutual consistency 
may not hold. We demonstrate, however, that the conditions hold in a restricted way and that 
this verifies the second order ancillary contours in moderate deviations. The methodology can 
generate an appropriate exact ancillary when such exists or an approximate ancillary for the 
numerical or Monte Carlo calculation of p-values and confidence quantiles. Examples are given, 
including nonlinear regression and several enigmatic examples from the literature. 
1. Introduction

Ancillaries are loved or hated, accepted or rejected, but typically ignored. Recent ap- 
proximate ancillary methods (e.g., [28]) give a decomposition of the sample space rather 
than providing statistics on the sample space (e.g., [7, 26]). As a result, continuity gives 
the contour along which the variable directly measures the parameter and then gives the 
subcontour that provides measurement of a parameter of interest. This, in turn, enables 
the high accuracy of cumulant generating function approximations [2, 9] to extend to 
cover a wide generality of statistical models. 
Ancillaries initially arose (see [10]) to examine the accuracy of the maximum likelihood
estimate, then (see [11]) to calibrate the loss of information in the use of the
maximum likelihood estimate and then (see [12]) to develop a key instance involving
the configuration statistic. The configuration of a sample arises naturally in the context
of sampling a location-scale model, where a standardized coordinate $z = (y - \mu)/\sigma$
has a fixed and known error distribution $g(z)$: the $i$th coordinate of the response thus
has $f(y_i;\mu,\sigma) = \sigma^{-1} g\{(y_i - \mu)/\sigma\}$. The configuration $a(y)$ of the sample is the plug-in
estimate of the standardized residual,

$$a(y) = \hat z = \left(\frac{y_1 - \hat\mu}{\hat\sigma}, \ldots, \frac{y_n - \hat\mu}{\hat\sigma}\right)', \tag{1.1}$$

where $(\hat\mu, \hat\sigma)$ is the maximum likelihood value for $(\mu, \sigma)$ or is some location-scale
equivalent. Clearly, the distribution of $\hat z$ is free of $\mu$ and $\sigma$, as the substitution $y_i = \mu + \sigma z_i$
in (1.1) leads to the cancellation of dependence on $\mu$ and $\sigma$. This supports a common
definition for an ancillary statistic $a(y)$, that it has a parameter-free distribution; other
conditions are often added to seek sensible results.
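
The cancellation can be checked numerically. The following is a minimal sketch, assuming for illustration that the sample mean and standard deviation serve as the location-scale equivalent of the maximum likelihood value; the function name `configuration` and the simulated errors are hypothetical and not part of the original development.

```python
import numpy as np

def configuration(y):
    """Plug-in standardized residuals (y_i - m) / s.

    Assumption: the sample mean and standard deviation stand in for the
    maximum likelihood value; any location-scale equivariant pair would do.
    """
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()

rng = np.random.default_rng(0)
z = rng.standard_cauchy(5)          # error sample from a fixed, known g(z)

# Two responses built from the same errors under different (mu, sigma).
y_a = 1.0 + 2.0 * z
y_b = -3.0 + 0.5 * z

# The dependence on (mu, sigma) cancels: both give the same configuration.
same = np.allclose(configuration(y_a), configuration(y_b))
```

Any other location-scale equivariant choice of estimates would give the same invariance, although the configuration values themselves would be rescaled.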

More generally, the observed value of an ancillary identifies a sample space contour 
along which parameter change modifies the model, thus yielding the conditional model 
on the observed contour as the appropriate model for the data. The ancillary method is 
to use directly this conditional model identified by the data. 

One approach to statistical inference is to use only the observed likelihood function
$L^0(\theta) = L(\theta; y^0)$ from the model $f(y;\theta)$ with observed data $y^0$. Inference can then be
based on some simple characteristic of that likelihood. Alternatively, a weight function
$w(\theta)$ can be applied and the composite $w(\theta)L(\theta)$ treated as a distribution describing the
unknown $\theta$; this leads to a rich methodology for exploring data, usually, but unfortunately,
promoted solely within the Bayesian framework.

A more incisive approach derives from an enriched model which is often available and 
appropriate. While the commonly cited model is just a set of probability distributions on 
the sample space, an enriched model can specifically include continuity of the model den- 
sity function and continuity of coordinate distribution functions. An approach that builds 
on these enrichments can then, for example, examine the observed data y° in relation to 
other data points that have a similar shape of likelihood and are thus comparable, and 
can do even more. For the location-scale model, such points are identified by the configuration
statistic; then, accordingly, the model for inference would be $f\{y \mid a(y) = a^0; \theta\}$,
where $a(y)$ is the configuration ancillary.

Exact ancillaries as just described are rather rare and seem limited to location- 
type models and simple variants. However, extensions that use approximate ancillar- 
ies (e.g., [18, 22]) have recently been broadly fruitful, providing approximation in an 
asymptotic sense. Technical issues can arise with approximate values for an increasing
number of coordinates, but these can be managed by using ancillary contours
rather than statistics; thus, for a circle, we use explicitly a contour
$A = \{(x,y) = (a^{1/2}\cos t, a^{1/2}\sin t): t \in [0, 2\pi)\}$ rather than using implicitly a statistic $x^2 + y^2 = a$.

We now assume independent coordinate distribution functions that are continuously
differentiable with respect to the variable and the parameter; extensions will be discussed
separately. Then, rather than working directly with a coordinate distribution function
$u_i = F_i(y_i; \theta)$, we will use the inverse, the quantile function $y_i = y_i(u_i; \theta)$, which presents a
data value $y_i$ in terms of a corresponding p-value $u_i$. For additional advantage, we could
use a scoring variable $x$ in place of the p-value, for example, $x = \Phi^{-1}(u)$ or $x = F^{-1}(u; \theta_0)$,
where $\Phi(\cdot)$ is the standard Normal distribution function. We can then write $y = y(x; \theta)$,
where a coordinate $y_i$ is presented in terms of the corresponding scoring variable $x_i$.

For the full response variable, let $y = y(x;\theta) = \{y_1(x_1;\theta), \ldots, y_n(x_n;\theta)\}'$ be the quantile
vector expressing $y$ in terms of the reference or scoring variable $x$ with its given
distribution: the quantile vector records how parameter change affects the response variable
and its distribution, as prescribed by the continuity of the coordinate distribution
functions.

For an observed data point $y^0$, a convenient reference value $\hat x^0$ or the fitted p-value
vector is obtained by solving the equation $y^0 = y(x;\hat\theta^0)$ for $x$, where $\hat\theta^0$ is the observed
maximum likelihood value; for this, we assume regularity and asymptotic properties for
the statistical model. The contour of the second order ancillary through the observed
data point as developed in this paper is then given as the trajectory of the reference
value,

$$A^0 = \{y(\hat x^0; t): t \in R^p\}, \tag{1.2}$$

to second order under parameter change, where $p$ here is the dimension of the parameter.
A sample space point on this contour has, to second order, the same estimated p-value
vector as the observed data point and special properties for the contours are available to
second order.

The choice of the reference variable with given data has no effect on the contour: the 
reference variable could be Uniform, as with the p-value; or, it could be the response 
distribution itself for some choice of the parameter, say $\theta_0$.

For the location-scale example mentioned earlier, we have the coordinate quantile
function $y_i = \mu + \sigma z_i$, where $z_i$ has the distribution $g(z)$. The vector quantile function is

$$y(z;\mu,\sigma) = \mu 1 + \sigma z, \tag{1.3}$$

where $1 = (1,\ldots,1)'$ is the 'one vector.' With the data point $y^0$, we then have the fitted
$\hat z^0 = (y^0 - \hat\mu^0 1)/\hat\sigma^0$. The observed ancillary contour to second order is then obtained from
(1.2) by substituting $\hat z^0$ in the quantile (1.3):

$$A^0 = \{y(\hat z^0; t)\} = \{m1 + s\hat z^0: (m, s) \in R \times R^+\} = L^+(1; \hat z^0) \tag{1.4}$$

with positive coefficient for the second vector. This is the familiar exact ancillary contour
$a(y) = a^0$ from (1.1).

An advantage of the vector quantile function in the context of the enriched model mentioned
above is that it allows us to examine how parameter change modifies the distribution
and thus how it moves data points as a direct expression of the explicit continuity.
In this sense, we define the velocity vector or vectors as $v(x;\theta) = (\partial/\partial\theta)y(x;\theta) = \partial y/\partial\theta$.
In the scalar $\theta$ case, this is a vector recording the direction of movement of a point $y$
under $\theta$ change; in the vector $\theta$ case, it is a $1 \times p$ array of such vectors in $R^n$,
$V(x;\theta) = \{v_1(x;\theta), \ldots, v_p(x;\theta)\}$, recording the separate effects from the parameter coordinates
$\theta_1, \ldots, \theta_p$. For the location-scale example, the velocity array is $V(z;\mu,\sigma) = (1, z)$, which
can be viewed as a $1 \times 2$ array of vectors in $R^n$.

The ancillary contour can then be presented using a Taylor series about $y^0$ with coefficients
given by the velocity and acceleration $V$ and $W$. For the location-scale example,
the related acceleration vectors are equal to zero.
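
The velocity array for the location-scale quantile function can be recovered by numerical differentiation. The sketch below is illustrative only: the helper names `quantile` and `velocity`, the step size and the reference values are assumptions, and the array is stored with the velocity vectors as columns.

```python
import numpy as np

def quantile(z, theta):
    """Vector quantile function y(z; mu, sigma) = mu * 1 + sigma * z."""
    mu, sigma = theta
    return mu + sigma * z

def velocity(z, theta, h=1e-6):
    """Central-difference estimate of V = dy/dtheta, returned as an n x p
    array whose columns are the velocity vectors."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for a in range(theta.size):
        step = np.zeros_like(theta)
        step[a] = h
        cols.append((quantile(z, theta + step) - quantile(z, theta - step)) / (2 * h))
    return np.column_stack(cols)

z = np.array([-1.2, 0.3, 0.9, 2.0])   # hypothetical reference values
V = velocity(z, (1.5, 2.0))
# The columns should reproduce the one vector and z, that is, V = (1, z).
```

Because the quantile function is linear in $(\mu, \sigma)$, the central differences are exact up to rounding, which is consistent with the acceleration vectors being zero in this example.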

For more insight, consider the general scalar $\theta$ case and the velocity vector $v(x;\theta_0)$. For
a typical coordinate, this gives the change $dy = v(x;\theta_0)\,d\theta$ in the variable as produced
by a small change $d\theta$ at $\theta_0$. A re-expression of the coordinate variable can make these
increments equal and produce a location model; the product of these location models
is a full location model $g(y_1 - \theta, \ldots, y_n - \theta)$ that precisely agrees with the initial model
to first derivative at $\theta = \theta_0$ (see [1, 20]). This location model then, in turn, determines
a full location ancillary with configuration $a(y) = (y_1 - \bar y, \ldots, y_n - \bar y)$. For the original
model, this configuration statistic has first-derivative ancillarity at $\theta = \theta_0$ and is thus a
first order approximate ancillary; the tangent to the contour at the data point is just the
vector $v(\hat x^0;\hat\theta^0)$. Also, this contour can be modified to give second order ancillarity.

In a somewhat different way, the velocity vector $v(y^0;\theta)$ at the data point $y^0$ gives
information as to how data change at $y^0$ relates to parameter change at various $\theta$ values
of interest. This allows us to examine how a sample space direction at the data point
relates to estimated p-value and local likelihood function shape at various $\theta$ values; this,
in turn, leads to quite general default priors for Bayesian analysis (see [21]).

In the presence of a cumulant generating function, the saddle-point method has produced
highly accurate third order approximations for density functions (see [9]) and for
distribution functions (see [25]). Such approximations are available in the presence of
exact ancillaries [2] and extend widely in the presence of approximate ancillaries (see 
[18]). For third order accuracy, only second order approximate ancillaries are needed, 
and for such ancillaries, only the tangents to the ancillary contour at the data point are 
needed (see [18, 19]). With this as our imperative, we develop the second order ancillary 
for statistical inference. 

Tangent vectors to an ancillary at a data point give information as mentioned above 
concerning a location model approximation at the data point. For a scalar parameter, 
these provide a vector field and integrate quite generally to give a unique approximate 
ancillary to second order accuracy. The resulting conditional model then provides definitive
p-values by available theory; see, for example, [22]. For a vector parameter, however,
the multiple vector fields may not satisfy the Frobenius conditions for integrability and 
thus may not define a function. 

Under mild conditions, however, we show that such tangent vectors do generate a surface
to second order without the Frobenius conditions holding. We show this in several
steps. First, we obtain the coordinate quantile functions $y_i = y_i(x_i;\theta)$. Second, we Taylor
series expand the full vector quantile $y = (y_1, \ldots, y_n)$ in terms of the full reference
variable $x = (x_1, \ldots, x_n)$ and the parameter $\theta = (\theta_1, \ldots, \theta_p)$ about data-based values,
appropriately re-expressing coordinates and working to second order. Third, we show that
this generates a partition with second order ancillary properties and the usual tangent
vectors. The seeming need for the full Frobenius conditions is bypassed by finding that
two integration routes need not converge to each other, but do remain on the same
contour, calculating, of course, to second order.

This construction of an approximate ancillary is illustrated in Section 2 using the familiar
example, the Normal-on-the-circle from [13]; see also [3, 8, 16, 20]. The example,
of course, does have an exact ancillary and the present procedure gives an approximation
to that ancillary. In Section 3, we consider various examples that have exact and
approximate ancillaries, and then in Sections 4 and 5, we present the supporting theory.
In particular, in Section 4, we develop notation for a $p$-dimensional contour in $R^n$,
$A = \{y(x^0;t): t \in R^p\}$, and use velocity and acceleration vectors to present a Taylor
series with respect to $t$. Then, in Section 5, we consider a regular statistical model with
asymptotic properties and use the notation from Section 4 to develop the second order
ancillary contour through an observed data point $y^0$. The re-expression of individual
coordinates, both of the variable and the parameter, plays an essential role in the development;
an asymptotic analysis is used to establish the second order approximate
ancillarity. Section 6 contains some discussion.

2. Normal-on-the-circle 

We illustrate the second order approximate ancillary with a simple nonlinear regression
model, the Normal-on-the-circle example (see [13]). The model has a well-known exact
ancillary. Let $y = (y_1, y_2)'$ be Normal on the plane with mean $(\rho\cos\theta, \rho\sin\theta)'$ and variance
matrix $I/n$ with $\rho$ known. The mean is on a circle of fixed radius $\rho$ and the distribution
has rotationally symmetric error with variances $n^{-1}$, suggesting an antecedent sample
size $n$ for an asymptotic approach. The full $n$-dimensional case is examined as Example 2
in Section 3 and the present case derives by routine conditioning.

The distribution is a unit probability mass centered at $(\rho\cos\theta, \rho\sin\theta)'$ on the circle
with radius $\rho$. If rotations about the origin are applied to $(y_1, y_2)'$, then the probability
mass rotates about the origin, the mean moves on the circle with radius $\rho$ and an element
of probability at a distance $r$ from the origin moves on a circle of radius $r$. The fact that
the rotations move probability along circles but not between circles of course implies
that probability on any circle about the origin remains constant: probability flows on the
ancillary contours. Accordingly, we have that the radial distance $r = (y_1^2 + y_2^2)^{1/2}$ has a
fixed $\theta$-free distribution and is thus ancillary.

The statistic r(y) is the Fisher exact ancillary for this problem and Fisher recom- 
mended that inference be based on the conditional model, given the observed ancil- 
lary contour. This conditional approach has a long but uneven history; [17] provides an 
overview and [23] offer links with asymptotic theory. We develop the approximate second 
order ancillary and examine how it relates to the Fisher exact ancillary. 

The model for the Normal-on-the-circle has independent coordinates, so we can invert
the coordinate distribution functions and obtain the vector quantile function,

$$y = y(x;\theta) = \rho(\cos\theta, \sin\theta)' + (x_1, x_2)',$$

where the $x_i = \Phi^{-1}(u_i)/n^{1/2}$ are independent Normal variables with means 0 and variances
$n^{-1}$, and $\Phi$ is the standard Normal distribution function. We now examine the
second order ancillary contour $A^0$ given by (1.2).

Figure 1. The regression surface $S$ is a circle of radius $\rho$; the local contour of the approximate
ancillary $A^0$ is a circle segment of $S$ moved from $\hat y^0$ to $y^0$; the exact ancillary contour is a circle
segment of radius $r^0$ through the data point $y^0$.

Let $y^0 = (y_1^0, y_2^0)' = (r^0\cos a^0, r^0\sin a^0)'$ be the observed data point, where $r^0, a^0$ are the
corresponding polar coordinates; see Figure 1. For this simple nonlinear normal regression
model, $\hat\theta^0 = a^0$ is the angular direction of the data point. The fitted reference value $\hat x^0$
is the solution of the equation $y^0 = y(x;\hat\theta^0) = \rho(\cos a^0, \sin a^0)' + (x_1, x_2)'$, giving
$\hat x^0 = (\hat x_1^0, \hat x_2^0)' = y^0 - \rho(\cos a^0, \sin a^0)' = y^0 - \hat y^0$, where $\hat y^0 = \rho(\cos a^0, \sin a^0)'$ is the fitted value,
which is the projection of the data point $y^0$ onto the circle. The observed ancillary contour
is then

$$A^0 = \{\rho(\cos\theta, \sin\theta)' + y^0 - \hat y^0: \theta \text{ near } a^0\} = \{y^0 - \hat y^0 + \rho(\cos(a^0 + t), \sin(a^0 + t))': t \text{ near } 0\}.$$

Figure 1 shows that $A^0 = \{y(\hat x^0;t): t \text{ near } a^0\}$ is a translation, as shown by the arrow, of
a segment of the solution contour $S$, from the fitted point $\hat y^0$ to the data point $y^0$.

The second order ancillary segment at $y^0$ does not lie on the exact ancillary surface
$r(y_1, y_2) = r^0$. The tangent vector at the data point $y^0$ is $v = (dy/dt)|_{t=a^0} =
(-\rho\sin a^0, \rho\cos a^0)'$, which is the same as the tangent vector for the exact ancillary and
which agrees with the usual tangent vector $v$ (see [22]). However, the acceleration vector
is $w = (d^2/dt^2)y|_{t=a^0} = (-\rho\cos a^0, -\rho\sin a^0)'$, which differs slightly from that for the
exact ancillary: the approximation has radius of curvature $\rho$, as opposed to $r^0$ for the
exact, but the difference in moderate deviations about $y^0$ can be seen to be small and is
second order.

The second order ancillary contour through $y^0$ can also be expressed in a Taylor series
as $A^0 = \{y^0 + tv + wt^2/2: t \text{ near } 0\}$; here, the acceleration vector $w$ is orthogonal to the
velocity vector $v$. Similar results hold in wide generality when $y$ has dimension $n$ and
$\theta$ has dimension $p$; further examples are discussed in the next section and the general
development follows in Sections 4 and 5.
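
A small numerical sketch of the Normal-on-the-circle contour may help fix ideas. The values of the radius, the antecedent sample size and the observed angle are hypothetical choices for illustration; the code only evaluates the quantile trajectory described above and compares it with its second degree Taylor series and with the exact ancillary circle.

```python
import numpy as np

rho, n = 1.0, 100                      # hypothetical radius and antecedent sample size
a0 = 0.7                               # observed angle, also the MLE of theta
r0 = rho + 1.0 / np.sqrt(n)            # observed radius, an O(n^{-1/2}) departure from rho

y0 = r0 * np.array([np.cos(a0), np.sin(a0)])       # data point
y_fit = rho * np.array([np.cos(a0), np.sin(a0)])   # fitted value on the circle
x0 = y0 - y_fit                                    # fitted reference value

def contour(t):
    """Point y(x0; a0 + t) on the approximate ancillary contour."""
    return rho * np.array([np.cos(a0 + t), np.sin(a0 + t)]) + x0

v = rho * np.array([-np.sin(a0), np.cos(a0)])      # velocity at the data point
w = -rho * np.array([np.cos(a0), np.sin(a0)])      # acceleration at the data point

t = 1.0 / np.sqrt(n)                   # a moderate deviation along the contour
taylor = y0 + t * v + 0.5 * t**2 * w   # second degree Taylor series
radial_gap = abs(np.linalg.norm(contour(t)) - r0)  # distance from the exact circle
```

With these values the radial gap between the approximate contour and the exact ancillary circle is of order $n^{-3/2}$, while the departure $t$ itself is of order $n^{-1/2}$.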
3. Some examples

Example 1 (Nonlinear regression, $\sigma_0$ known). Consider a nonlinear regression
model $y = \eta(\theta) + x$ in $R^n$, where the error $x$ is Normal$(0;\sigma_0^2 I)$ and the regression or
solution surface $S = \{\eta(\theta)\}$ is smooth with parameter $\theta$ of dimension, say, $r$. For given
data point $y^0$, let $\hat\theta^0$ be the maximum likelihood value. The fitted value is then $\hat y^0 = \eta(\hat\theta^0)$
and the fitted reference value is $\hat x^0 = y^0 - \eta(\hat\theta^0) = y^0 - \hat y^0$. The model as presented is
already in quantile form; accordingly, $V = (\partial\eta/\partial\theta')|_{\hat\theta^0}$, $W = (\partial^2\eta/\partial\theta\,\partial\theta')|_{\hat\theta^0}$ are the observed
velocity and acceleration arrays, respectively, and the approximate ancillary contour at
the data point $y^0$ is $A^0 = \{y^0 + Vt + t'Wt/2 + \cdots: t \in R^r\}$, which is just a $y^0 - \hat y^0$
translation of the solution surface $S = \{\hat y^0 + Vt + t'Wt/2 + \cdots: t \in R^r\}$. For this, we
use matrix multiplication to linearly combine the elements in the arrays $V$ and $W$.

Example 2 (Nonlinear regression, circle case). As a special case, consider the
regression model where the solution surface $S = \{\eta(\theta)\}$ is a circle of radius $\rho$ about the
origin; this is the full-dimension version of the example in Section 2. For notation, let
$C = (c_1, \ldots, c_n)$ be an orthonormal basis with vectors $c_1, c_2$ defining the plane that includes $S$.
Then $\tilde y = C'y$ provides rotated coordinates and $\tilde\eta(\theta) = C'\eta(\theta) = (\rho\cos\theta, \rho\sin\theta, 0, \ldots, 0)'$
gives the solution surface in the new coordinates.

There is an exact ancillary given by $r = (\tilde y_1^2 + \tilde y_2^2)^{1/2}$ and $(\tilde y_3, \ldots, \tilde y_n)$; the corresponding
ancillary contour through $\tilde y^0$ is a circle of radius $r^0$ through the data point $\tilde y^0$ and lying
in the plane $\tilde y_3 = \tilde y_3^0, \ldots, \tilde y_n = \tilde y_n^0$. The approximate ancillary contour is a segment of a
circle of radius $\rho$ through the data point $\tilde y^0$ and lying in the same plane. This directly
agrees with the simple Normal-on-the-circle example of Section 2.

For the nonlinear regression model, Severini ([29], page 216) proposes an approximate
ancillary by using the obvious pivot $y - \eta(\theta)$ with the plug-in maximum likelihood value
$\theta = \hat\theta$; we show that this gives a statistic $A(y) = y - \eta(\hat\theta)$ that can be misleading. In the
rotated coordinates, the statistic $A(y)$ becomes

$$A(\tilde y) = (r\cos\hat\theta, r\sin\hat\theta, \tilde y_3, \ldots, \tilde y_n)' - (\rho\cos\hat\theta, \rho\sin\hat\theta, 0, \ldots, 0)'
= \{(r - \rho)\cos\hat\theta, (r - \rho)\sin\hat\theta, \tilde y_3, \ldots, \tilde y_n\}',$$

which has observed value $A^0 = \{(r^0 - \rho)\cos\hat\theta^0, (r^0 - \rho)\sin\hat\theta^0, \tilde y_3^0, \ldots, \tilde y_n^0\}'$.

If we now set the proposed ancillary equal to its observed value, $A = A^0$, we obtain
$\tilde y_3 = \tilde y_3^0, \ldots, \tilde y_n = \tilde y_n^0$ and also obtain $r = r^0$ and $\hat\theta = \hat\theta^0$. Together, these say that $\tilde y = \tilde y^0$, and
thus that the proposed approximate ancillary is exactly equivalent to the original response
variable, which is clearly not ancillary. Severini does note "... it does not necessarily follow
that a is a second-order ancillary statistic since the dimension of a increases with n."
The consequences of using the plug-in $\hat\theta$ in the pivot are somewhat more serious: the
plug-in pivotal approach for this example does not give an approximate ancillary.
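
The equivalence between the plug-in pivot and the response can be seen in a short sketch restricted to the plane of the circle. The radius value, the function names `plug_in_pivot` and `recover`, and the restriction to data outside the circle ($r > \rho$) are assumptions made here for illustration.

```python
import numpy as np

rho = 1.0  # known radius of the solution circle (hypothetical value)

def plug_in_pivot(y):
    """A(y) = y - eta(theta_hat) in the plane of the circle."""
    theta_hat = np.arctan2(y[1], y[0])          # MLE: angular direction of y
    return y - rho * np.array([np.cos(theta_hat), np.sin(theta_hat)])

def recover(A):
    """Invert A back to y, assuming the data lie outside the circle (r > rho)."""
    r = rho + np.linalg.norm(A)
    theta = np.arctan2(A[1], A[0])
    return r * np.array([np.cos(theta), np.sin(theta)])

# Two points on the same exact ancillary contour r = 1.3.
y_a = 1.3 * np.array([np.cos(0.4), np.sin(0.4)])
y_b = 1.3 * np.array([np.cos(1.1), np.sin(1.1)])
A_a, A_b = plug_in_pivot(y_a), plug_in_pivot(y_b)
```

The two points share the exact ancillary value $r = 1.3$ yet give different values of the plug-in pivot, and each response is recovered from its pivot value, so the pivot carries the full data rather than an ancillary summary.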

Example 3 (Nonlinear regression, $\sigma$ unknown). Consider a nonlinear regression
model $y = \eta(\theta) + \sigma z$ in $R^n$, where the error $z$ is Normal$(0;I)$ and the solution surface
$S = \{\eta(\theta)\}$ is smooth with surface dimension $r$ (see [24]). Let $y^0$ be the observed data
point and $(\hat\theta^0, \hat\sigma^0)$ be the corresponding maximum likelihood value. We then have the
fitted regression $\hat y^0$, the fitted residual $\hat x^0 = y^0 - \hat y^0$, and the fitted reference value
$\hat z^0 = \hat x^0/\hat\sigma^0$, which is just the standardized residual.

Simple calculation gives the velocity and acceleration arrays

$$\tilde V = (V, \hat z^0), \qquad \tilde W = \begin{pmatrix} W & 0 \\ 0 & 0 \end{pmatrix},$$

using $V$ and $W$ from Example 1. The approximate ancillary contour at the data point
$y^0$ is then

$$A = \{\hat y^0 + Vt + t'Wt/2 + \cdots + s\hat z^0: t \in R^r, s \in R^+\}
= \{\eta(t) + s\hat z^0: t \in R^r, s \in R^+\}
= A^0 + L^+(\hat z^0),$$

where $A^0$ is as in Example 1. This is the solution surface from Example 1, translated
from $\hat y^0$ to $y^0$ and then positively radiated in the $\hat z^0$ direction.

Example 4 (The transformation model). The transformation model (see, e.g., [14])
provides a paradigm for exact ancillary conditioning. A typical continuous transformation
model for a variable $y = \theta z$ has parameter $\theta$ in a smooth transformation group $G$ that
operates on an $n$-dimensional sample space for $y$; for illustration, we assume here that the
group acts coordinate by coordinate. The natural quantile function for the $i$th coordinate
is $y_i = \theta z_i$, where $z_i$ is a coordinate reference variable with a fixed distribution; the linear
regression models with known and with unknown error scaling are simple examples. With
observed data point $y^0$, let $\hat\theta^0$ be the maximum likelihood value and $\hat z^0$ the corresponding
reference value satisfying $y^0 = \hat\theta^0\hat z^0$. The second order approximate ancillary is then given
as $\{\theta\hat z^0\}$, which is just the usual transformation model orbit $G\hat z^0$. If the group does not
apply separately to independent coordinates, then the present quantile approach may
not be immediately applicable; this raises issues for the construction of the trajectories
and also for the construction of default priors (see, e.g., [4]). Some discussion of this in
connection with curved parameters will be reported separately. A modification achieved
by adding structure to the transformation model is given by the structural model [14].
This takes the reference distribution for $z$ as the primary probability space for the model
and examines what events on that space are identifiable from an observed response; we
do not address here this alternative modelling approach.

Example 5 (The inverted Cauchy). Consider a location-scale model centered at $\mu$
and scaled by $\sigma$ with error given by the standard Cauchy; this gives the statistical model

$$f(y;\mu,\sigma) = \frac{1}{\pi\sigma\{1 + (y - \mu)^2/\sigma^2\}}$$

on the real line. For the sampling version, this location-scale model is an example of
the transformation model discussed in the preceding Example 4 and the long-accepted
ancillary contour is the half-plane (1.4).

Figure 2. (a) The location-scale Cauchy model for the inverted $\tilde y_1 = 1/y_1$, $\tilde y_2 = 1/y_2$ has an
ancillary contour given by the shaded area in (b). When interpreted back for the original $(y_1, y_2)$, the
connected ancillary contour becomes three unconnected regions, shown in (a). A line $\tilde y_2 = \tilde y_1 + 1$
on the contour in (b) is mapped back to three curved segments in (a) and numbered points
in sequence on the line are mapped back to the numbered points on the unconnected ancillary
contour.

McCullagh [27] uses linear fractional transformation results that show that the inversion
$\tilde y = 1/y$ takes the Cauchy $(\mu, \sigma)$ model for $y$ into a Cauchy $(\tilde\mu, \tilde\sigma)$ model for $\tilde y$, where
$\tilde\mu = \mu/(\mu^2 + \sigma^2)$, $\tilde\sigma = \sigma/(\mu^2 + \sigma^2)$. He then notes that the usual location-scale ancillary
for the derived model does not map back to give the usual location-scale ancillary on
the initial space and would thus typically give different inference results for the parameters;
he indicates "not that conditioning is a bad idea, but that the usual mathematical
formulation is in some respects ad hoc and not completely satisfactory."
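
The parameter mapping under inversion can be confirmed by a change of variables on the density. This is a minimal check at a few hypothetical parameter values and evaluation points; it is not part of McCullagh's argument, only an illustration of the stated result.

```python
import numpy as np

def cauchy_pdf(y, mu, sigma):
    """Location-scale Cauchy density."""
    return 1.0 / (np.pi * sigma * (1.0 + ((y - mu) / sigma) ** 2))

mu, sigma = 0.8, 1.5                     # hypothetical parameter values
d = mu**2 + sigma**2
mu_t, sigma_t = mu / d, sigma / d        # parameters of the inverted model

t = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])   # nonzero values of the inverted variable 1/y

# Change of variables: the density of 1/y at t is f(1/t) |d(1/t)/dt| = f(1/t) / t^2.
transformed = cauchy_pdf(1.0 / t, mu, sigma) / t**2
direct = cauchy_pdf(t, mu_t, sigma_t)
```

The two arrays agree at every nonzero evaluation point; the point $t = 0$ is excluded because the inversion is not defined there, which is the singularity discussed in this example.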

We illustrate this for $n = 2$ in Figure 2. For a data point in the upper-left portion of
the plane in part (b) for the inverted Cauchy, the observed ancillary contour is shown
as a shaded area; it is a half-plane subtended by $L(1)$. When this contour is mapped
back to the initial plane in part (a), the contour becomes three disconnected segments
with lightly shaded edges indicating the boundaries; in particular, the line with marks
1, 2, 3, 4, 5, 6 becomes three distinct curves, again with corresponding marks 1, 2, 3, 4,
5, 6, but two points $(0, 1)$, $(-1, 0)$ on the line have no back images. Indeed, the same type
of singularity, where a point with a zero coordinate cannot be mapped back, happens 
for any sample size n. Thus the proposed sample space is not one-to-one continuously 
equivalent to the given sample space: points are left out and points are created. And the 
quantile function used on the proposed sample space for constructing the ancillary does 
not exist on the given sample space: indeed, it is not defined at points and is thus not 
continuous. 
The Cauchy inversion about 0 could equally be about an arbitrary point, say $a$, on the
real line and would lead to a corresponding ancillary. We would thus have a wealth of 
competing ancillaries and a corresponding wealth of inference procedures, and all would 
have the same lack of one-to-one continuous equivalence to the initial sample space. 
While Fisher seems not to have explicitly specified continuity as a needed ingredient for 
typical ancillarity, it also seems unlikely that he would have envisaged ancillarity without 
continuity. If continuity is included in the prescription for developing the ancillary, then 
the proposed ancillary for the inverted Cauchy would not arise. 

Bayesian statistics involves full conditioning on the observed data and familiar frequen- 
tist inference avoids, perhaps even evades, conditioning. Ancillarity, however, represents 
an intermediate or partial conditioning and, as such, offers a partial bridging of the two 
extreme approaches to inference. 

4. An asymptotic statistic 

For the Normal-on-the-circle example, the exact ancillary contour was given as the observed
contour of the radial distance $r(y_1, y_2)$: the contour is described implicitly. By
contrast, the approximate ancillary was given as the trajectory of a point $y(\hat x^0; t)$ under
change of an index or mathematical parameter $t$: the contour is described explicitly.
For the general context, the first approach has serious difficulties, as found even with 
nonlinear regression, and these difficulties arise with an approximate statistic taking an 
approximate value; see Example 2. Accordingly, we now turn to the second, the explicit 
approach, and develop the needed notation and expansions. 

Consider a smooth one-dimensional contour through some point $y_0$. To describe such
a contour in the implicit manner requires $n - 1$ complementary statistics. By contrast,
for the explicit method, we write $y = y(t)$, which maps a scalar $t$ into the sample space
$R^n$. More generally, for a $p$-dimensional contour, we have $y = y(t)$ in $R^n$, where $t$ has
dimension $p$ and the mapping is again into $R^n$.

For such a contour, we define the row array $V(t) = (\partial/\partial t')y(t) = \{v_1(t), \ldots, v_p(t)\}$ of
tangent vectors, where the vector $v_\alpha(t) = (\partial/\partial t_\alpha)y(t)$ gives the direction or gradient of
$y(t)$ with respect to change in a coordinate $t_\alpha$. We are interested in such a contour near
a particular point $y_0 = y(t_0)$; for convenience, we often choose $y_0$ to be the observed
data point $y^0$ and the $t_0$ to be centered so that $t_0 = 0$. In particular, the array
$V = V(t_0)$ of tangent vectors at a particular data point $y_0$ will be of special interest. The
vectors in $V$ generate a tangent plane $L(V)$ at the point $y_0$ and this plane provides a
linear approximation to the contour. Differential geometry gives length properties of such
vectors as the first fundamental form:

$$V'V = (v_\alpha \cdot v_{\alpha'});$$

this records the matrix of inner products for the vectors $V$ as inherited from the inner
product on $R^n$. A change in the parameterization $\tilde t = \tilde t(t)$ of the contour will give different
tangent vectors $\tilde V$, the same tangent plane $L(\tilde V)$ and a different, but corresponding, first
fundamental form.

Now, consider the derivatives of the tangents $V(t)$ at $t_0$:

$$W = \frac{\partial}{\partial t} V(t)\Big|_{t=t_0} = \begin{pmatrix} w_{11} & \cdots & w_{1p} \\ \vdots & & \vdots \\ w_{p1} & \cdots & w_{pp} \end{pmatrix},$$

where $w_{\alpha\alpha'} = (\partial^2/\partial t_\alpha\,\partial t_{\alpha'})y(t)|_{t=t_0}$ is an acceleration or curvature vector relative to
coordinates $t_\alpha$ and $t_{\alpha'}$ at $t_0$. We regard the array $W$ as a $p \times p$ array of vectors in $R^n$.
We could have used tensor notation, but the approach here has the advantage that we
can write the second degree Taylor expansion of $y(t)$ at $t_0 = 0$ as

$$y(t) = y_0 + Vt + t'Wt/2 + \cdots, \tag{4.1}$$

which uses matrix multiplication for linearly combining the vectors in the arrays $V$
and $W$. Some important characteristics of the quadratic term in (4.1) are obtained by
orthogonalizing the elements of $W$ to the tangent plane $L(V)$, to give residuals

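$$\tilde w_{\alpha\alpha'} = \{I - V(V'V)^{-1}V'\}w_{\alpha\alpha'} = w_{\alpha\alpha'} - Pw_{\alpha\alpha'};$$

this uses the regression analysis projection matrix $P = V(V'V)^{-1}V'$. The full array $\tilde W$
of such vectors $\tilde w_{\alpha\alpha'}$ is then written $\tilde W = W - PW = W - VH$, where $H = (h_{\alpha\alpha'})$ is a
$p \times p$ array of elements $h_{\alpha\alpha'} = (V'V)^{-1}V'w_{\alpha\alpha'}$; an element $h_{\alpha\alpha'}$ is a $p \times 1$ vector, which
records the regression coefficients of $w_{\alpha\alpha'}$ on the vectors $V$.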

The array $\tilde W$ of such orthogonalized curvature vectors $\tilde w_{\alpha\alpha'}$ is the second fundamental
form for the contour at the expansion point. Consider the Taylor expansion (4.1) and
substitute $W = \tilde W + VH$:

$$y(t) = y_0 + Vt + t'(\tilde W + VH)t/2 + \cdots
= y_0 + V(t + t'Ht/2) + t'\tilde Wt/2 + \cdots,$$

where we note that $t$ and $t'$ are being applied to the $p \times p$ arrays $H$ and $\tilde W$ by matrix
multiplication, but the elements are $p \times 1$ vectors for $H$ and $n \times 1$ vectors for $\tilde W$, and
these are being combined linearly. We can then write $y(t) = y_0 + V\tilde t + \tilde t'\tilde W\tilde t/2 + \cdots$ and
thus have the contour expressed in terms of orthogonal curvature vectors $\tilde w_{\alpha\alpha'}$ with the
reparameterization $\tilde t = t + t'Ht/2 + \cdots$. When we use this in the asymptotic setting,
we will have standardized coordinates and the reparameterization will take the form
$\tilde t = t + t'Ht/2n^{1/2} + \cdots$.
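
The orthogonalization and the resulting rearrangement of the quadratic term can be sketched numerically. The arrays below are randomly generated placeholders, not taken from any model, and the storage convention (tangent vectors as columns of an $n \times p$ matrix, curvature vectors in a $p \times p \times n$ array) is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2
V = rng.normal(size=(n, p))              # columns are tangent vectors v_1, ..., v_p
W = rng.normal(size=(p, p, n))           # W[a, b] is the curvature vector w_{ab}
W = (W + W.transpose(1, 0, 2)) / 2       # symmetric in (a, b), like second derivatives

G = V.T @ V                              # first fundamental form V'V
coef = np.linalg.solve(G, V.T)           # (V'V)^{-1} V', a p x n matrix
P = V @ coef                             # projection onto the tangent plane L(V)

H = np.einsum('ij,abj->abi', coef, W)            # regression coefficients h_{ab}
W_tilde = W - np.einsum('ij,abj->abi', P, W)     # orthogonalized curvature vectors

def quad(A, t):
    """t'At for a p x p array A whose elements are vectors."""
    return np.einsum('a,abi,b->i', t, A, t)

t = np.array([0.3, -0.2])
lhs = V @ t + quad(W, t) / 2
rhs = V @ (t + quad(H, t) / 2) + quad(W_tilde, t) / 2
```

Here `lhs` and `rhs` are the two forms of the first and second degree terms in the displayed expansion; they agree exactly, and each orthogonalized curvature vector has zero inner product with every tangent vector.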



5. Verifying second order ancillarity 

We have used the Normal-on-the-circle example to illustrate the proposed second order
ancillary contour $\{y(\hat x^0;t)\}$. Now, generally, let $f(y;\theta)$ be a statistical model with regularity
and asymptotic properties as the data dimension $n$ increases: we assume that
the vector quantile $y(x;\theta)$ has independent scalar coordinates and is smooth in both the
reference variable $x$ and the parameter $\theta$; more general conditions will be considered
subsequently. For the verification, we use a Taylor expansion of the quantile function in
terms of both $x$ and $\theta$, and work from theory developed in [5] and [1]. The first steps
involve the re-expression of individual coordinates of $y$, $x$ and $\theta$, and show that the proposed
contours establish a partition on the sample space; the subsequent steps establish
the ancillarity of the contours.

(1a) Standardizing the coordinates. Consider the statistical model in moderate deviations
about $(y^0, \theta^0)$ to order $O(n^{-1})$. For this, we work with coordinate departures in units
scaled by $n^{-1/2}$. Thus, for the $i$th coordinate, we write $y_i = y_i^0 + \tilde y_i/n^{1/2}$, $x_i = x_i^0 + \tilde x_i/n^{1/2}$
and $\theta_\alpha = \theta_\alpha^0 + \tilde\theta_\alpha/n^{1/2}$; and for a modified $i$th quantile coordinate $\tilde y_i = \tilde y_i(\tilde x_i, \tilde\theta)$, we
Taylor expand to the second order, omit the subscripts and tildes for temporary clarity, and
obtain $y = x + V\theta + (ax^2 + 2xB\theta + \theta'W\theta)/2n^{1/2}$, where $V$ is the $1 \times p$ gradient of $y$
with respect to $\theta$, $B$ is the $1 \times p$ cross Hessian with respect to $x$ and $\theta$, $W$ is the $p \times p$
Hessian with respect to $\theta$ and vector-matrix multiplication is used for combining $\theta$ with
the arrays.

(1b) Re-expressing coordinates for a nicer expansion. We next re-express an $x$ coordinate,
writing $\tilde x = x + ax^2/2n^{1/2}$, and then again omit the tildes to obtain the simpler
expansion

$$y = x + V\theta + (2xB\theta + \theta'W\theta)/2n^{1/2} + \cdots, \qquad (5.1)$$

to order $O(n^{-1})$ for the modified $y$, $x$ and $\theta$, now in bounded regions about 0.

(1c) Full response vector expansion. For the vector response $y = (y_1, \ldots, y_n)$ in quantile
form, we can compound the preceding coordinate expansions and write
$y = x + V\theta + (2x{:}B\theta + \theta'W\theta)/2n^{1/2} + \cdots$, where $y$ and $x$ are now vectors in $R^n$,
$V = (v_1, \ldots, v_p) = (v_\alpha)$ and $B = (b_1, \ldots, b_p) = (b_\alpha)$ are $1 \times p$ arrays of vectors in $R^n$,
$W = (w_{\alpha\alpha'})$ is a $p \times p$ array of vectors in $R^n$ and $x{:}B$ is a $1 \times p$ array of vectors $x{:}b$,
where the $i$th element of the vector $x{:}b$ is the product $x_i b_i$ of the $i$th elements of the
vectors $x$ and $b$.
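The compounding of coordinate expansions can be sketched in code. The following Python fragment is a minimal illustration with randomly generated arrays; the function names `coordinate_expansion` and `vector_expansion` and the array layout (tangent and cross-Hessian vectors as columns, $W$ as a three-way array) are assumptions made here for illustration. It shows that the $x{:}B$ notation is simply a coordinate-by-coordinate product and that the vector form agrees with the scalar form (5.1) applied to each coordinate.

```python
import numpy as np

def coordinate_expansion(x_i, theta, v_i, b_i, W_i, n):
    """Scalar form (5.1) for one coordinate: v_i, b_i have length p, W_i is p x p."""
    second = 2 * x_i * (b_i @ theta) + theta @ W_i @ theta
    return x_i + v_i @ theta + second / (2 * np.sqrt(n))

def vector_expansion(x, theta, V, B, W):
    """Compounded form: V, B are (n, p), W is (p, p, n), x has length n."""
    n = x.shape[0]
    cross = x * (B @ theta)                            # (x:B) theta, elementwise in i
    quad = np.einsum("a,abi,b->i", theta, W, theta)    # theta' W theta, one entry per i
    return x + V @ theta + (2 * cross + quad) / (2 * np.sqrt(n))

# Illustrative inputs (assumed, not taken from the paper).
rng = np.random.default_rng(1)
n, p = 5, 2
x = rng.normal(size=n)
theta = rng.normal(size=p)
V = rng.normal(size=(n, p))
B = rng.normal(size=(n, p))
W = rng.normal(size=(p, p, n))

y_vec = vector_expansion(x, theta, V, B, W)
y_coord = np.array([
    coordinate_expansion(x[i], theta, V[i], B[i], W[:, :, i], n) for i in range(n)
])
print(np.allclose(y_vec, y_coord))
```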

(1d) Eliminate the cross Hessian: scalar parameter case. The form of a Taylor series
depends heavily on how the function and the component variables are expressed. For a
particular coordinate of (5.1) in (1b), if we re-express the coordinate $y = \tilde y + c\tilde y^2/2n^{1/2}$
in terms of a modified $\tilde y$, substitute it in (5.1) and then, for notational ease, omit the
tildes, we obtain $y + c(x + v\theta)^2/2n^{1/2} = x + v\theta + (2xb\theta + \theta^2 w)/2n^{1/2}$. To simplify this, we
take the $x^2$ term over to the right-hand side and combine it with $x$ to give a re-expressed
$x$, take the $\theta x$ term over to the right-hand side and choose $c$ so that $cv = b$ and, finally,
combine the $\theta^2$ terms giving a new $w$. We then obtain $y(x; \theta) = x + v\theta + \theta^2 w/2n^{1/2}$ with
the cross Hessian removed; for this, if $v = 0$, we ignore the coordinate as being ineffective
for $\theta$. For the full response accordingly, we then have $y(x; \theta) = x + v\theta + w\theta^2/2n^{1/2} + \cdots$
to the second order in terms of re-expressed coordinates $x$ and $y$. The trajectory of a
point $x$ is $A(x) = \{y(x; t)\} = \{x + vt + wt^2/2n^{1/2} + \cdots\}$ to the second order as $t$ varies.
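The removal of the cross Hessian can be checked numerically for a single coordinate. The sketch below is an illustration under stated assumptions: the values of $x$, $\theta$, $v$, $b$, $w$ are arbitrary, the function names are hypothetical, and the re-expression $y = \tilde y + c\tilde y^2/2n^{1/2}$ is inverted only to the order needed, as $\tilde y = y - cy^2/2n^{1/2}$. With $c = b/v$, re-expressed $x - cx^2/2n^{1/2}$ and new $w - cv^2$, the gap from the cross-term-free form should be $O(n^{-1})$, so it should shrink by roughly a factor of 100 when $n$ grows from $10^2$ to $10^4$.

```python
import numpy as np

def y_original(x, theta, v, b, w, n):
    """One coordinate of (5.1) with scalar theta."""
    return x + v * theta + (2 * x * b * theta + w * theta**2) / (2 * np.sqrt(n))

def residual(x, theta, v, b, w, n):
    """Gap between the re-expressed response and the cross-term-free form."""
    c = b / v                                    # chosen so that c v = b (needs v != 0)
    y = y_original(x, theta, v, b, w, n)
    y_new = y - c * y**2 / (2 * np.sqrt(n))      # inverse re-expression, to O(1/n)
    x_new = x - c * x**2 / (2 * np.sqrt(n))      # x^2 term absorbed into x
    w_new = w - c * v**2                         # theta^2 terms combined into a new w
    return y_new - (x_new + v * theta + w_new * theta**2 / (2 * np.sqrt(n)))

# Arbitrary illustrative values (assumed).
x, theta, v, b, w = 0.7, -0.4, 1.3, 0.5, 0.8
r_small = residual(x, theta, v, b, w, n=1e2)
r_large = residual(x, theta, v, b, w, n=1e4)
print(r_small, r_large, r_small / r_large)
```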

(1e) Scalar case: trajectories form a partition. In the standardized coordinates, the
initial data point is $y^0 = 0$ with corresponding maximum likelihood value $\theta^0 = 0$; the
corresponding trajectory is $A(0) = \{vt + wt^2/2n^{1/2} + \cdots\}$. For a general reference value
$x$, but with $\hat\theta(x) = 0$, the trajectory is $A(x) = \{x + vt + wt^2/2n^{1/2} + \cdots\} = x + A(0)$. The
sets with $\hat\theta(x) = 0$ are all translates of $A(0)$ and thus form a partition.

Consider an initial point $x_0$ with maximum likelihood value $\hat\theta(x_0) = 0$ and let
$y_1 = x_0 + vt_1 + wt_1^2/2n^{1/2} + \cdots$ be a point in the set $A(x_0) = x_0 + A(0)$. We calculate the trajectory
$A(y_1)$ of $y_1$ and show that it lies on $A(x_0)$; the partition property then follows and the
related Jacobian effect is constant. From the quantile function $y = x + v\theta + w\theta^2/2n^{1/2}$, we
see that the $y$ distribution is a $\theta$-based translation of the reference distribution described
by $x$. Thus the likelihood at $y_1$ is $l(y_1 - v\theta - w\theta^2/2n^{1/2})$, in terms of the log density
$l(x)$ near $x_0$. It follows that $y_1 = x_0 + vt_1 + wt_1^2/2n^{1/2}$ has maximum likelihood value
$\hat\theta(y_1) = t_1$.

Now, for the trajectory about $y_1$, we calculate derivatives

$$\frac{dy}{d\theta} = v + w\theta/n^{1/2}, \qquad \frac{d^2y}{d\theta^2} = w/n^{1/2},$$

which, at the point $y_1 = x_0 + vt_1 + wt_1^2/2n^{1/2}$ with $\theta = \hat\theta(y_1)$, gives

$$V(y_1) = v + wt_1/n^{1/2}, \qquad W(y_1) = w/n^{1/2},$$

to order $O(n^{-1})$. We thus obtain the trajectory of the point $y_1$:

$$A(y_1) = \{x_0 + vt_1 + wt_1^2/2n^{1/2} + (v + wt_1/n^{1/2})t + wt^2/2n^{1/2}\}$$
$$\phantom{A(y_1)} = \{x_0 + vT + wT^2/2n^{1/2}\}$$

under variation in $t$. However, with $T = t_1 + t$, we have just an arbitrary point on the
initial trajectory. Thus the mapping $y \to A(y)$ is well defined and the trajectories
generate a partition, to second order in moderate deviations in $R^n$. In the standardized
coordinates, the Jacobian effect is constant.
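The reproduction of a trajectory from one of its own points can be illustrated numerically. The following Python sketch uses arbitrary vectors for $x_0$, $v$ and $w$ and the hypothetical helper names `trajectory` and `trajectory_about`; it takes the truncated second order forms as exact and checks that the trajectory built about $y_1$, using the tangent $v + wt_1/n^{1/2}$ and curvature $w/n^{1/2}$ at $y_1$, coincides with the initial trajectory evaluated at $T = t_1 + t$.

```python
import numpy as np

def trajectory(x, v, w, n, t):
    """Rows are the points x + v t + w t^2 / (2 n^{1/2}), one row per value of t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return x + np.outer(t, v) + np.outer(t**2, w) / (2 * np.sqrt(n))

def trajectory_about(y1, v, w, n, t1, t):
    """Second order trajectory about y1, using the derivatives evaluated at theta = t1."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    V1 = v + w * t1 / np.sqrt(n)     # first derivative dy/dtheta at theta = t1
    W1 = w / np.sqrt(n)              # second derivative d^2y/dtheta^2
    return y1 + np.outer(t, V1) + np.outer(t**2, W1) / 2

# Arbitrary illustrative values (assumed).
rng = np.random.default_rng(2)
n = 8
x0, v, w = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
t1 = 0.6
y1 = trajectory(x0, v, w, n, t1)[0]          # a point on A(x0)

t = np.linspace(-1.0, 1.0, 11)
from_y1 = trajectory_about(y1, v, w, n, t1, t)
on_initial = trajectory(x0, v, w, n, t1 + t)  # same curve, parameter T = t1 + t
print(np.allclose(from_y1, on_initial))
```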

(1f) Vector case: trajectories form a partition. For the vector parameter case, we again
use standardized coordinates and choose a parameterization that gives orthogonal curvature
vectors $w$ at the observed data point $y^0$. We then examine scalar parameter change
on some line through $\hat\theta(y^0)$. For this, the results above give a trajectory and any point on
it reproduces the trajectory under that scalar parameter. Orthogonality ensures that the
vector maximum likelihood value is on the same line just considered. These trajectories
are, of course, part of the surface defined by $\{Vt + t'Wt/2n^{1/2}\}$. We then use the partition
property of the individual trajectories as these apply perpendicular to the surface;
the surfaces are thus part of a partition. We can then write the trajectory of a point $x$
as a set

$$A(x) = \{x + Vt + t'Wt/2n^{1/2} + \cdots : t\} = x + A(0) \qquad (5.2)$$

in a partition to the second order in moderate deviations.

(2a) Observed information standardization. With moderate regularity, and following
[18] and [23], we have a limiting Normal distribution conditionally on $y^0 + \mathcal{L}(V)$. We
then rescale the parameter at $\theta^0$ to give identity observed information and thus an
identity variance matrix for the Normal distribution to second order. We also have a
limiting Normal distribution conditionally on $y^0 + \mathcal{L}(V, W)$; for this, we linearly modify
the vectors in $W$ by rescaling and regressing on $\mathcal{L}(V)$ to give distributional orthogonality
to $\hat\theta$ and identity conditional variance matrix to second order.

(2b) The trajectories are ancillary: first derivative parameter change. We saw in the
preceding section that key local properties of a statistical model were summarized by the
tangent vectors $V$ and the curvature vectors $W$, and that the latter can, to advantage,
be taken to be orthogonal to the tangent vectors. These vectors give local coordinates
for the model and can be replaced by an appropriate subset if linear dependencies are
present.

First, consider the conditional model given the directions corresponding to the span
$y^0 + \mathcal{L}\{V, W\}$. From the ancillary expansion (5.2), we have that change of $\theta$ to the second
order moves points within the linear space $y^0 + \mathcal{L}\{V, W\}$; accordingly, this conditioning
is ancillary. Then, consider the further conditioning to an alleged ancillary contour, as
described by (5.2). Also, let $y_0$ be a typical point having $\hat\theta(y_0) = \theta^0$ as the corresponding
maximum likelihood value; $y_0$ is thus on the observed maximum likelihood contour.

Now, consider a rotationally symmetric Normal distribution on the $(x, y)$ plane with
mean $\theta$ on the $x$ axis and let $a = y + cx^2/2$ be linear in $y$ with a quadratic adjustment
with respect to $x$. Then $a = a(x, y)$ is first-derivative ancillary at $\theta = 0$. For this, we
assume, without loss of generality, that the standard deviations are unity. The marginal
density for $a$ is then

$$f(a; \theta) = \int_{-\infty}^{\infty} \phi(x - \theta)\,\phi(a - cx^2/2)\,dx,$$

with $\phi$ the standard Normal density; replacing $x$ by $-x$ in the integral shows that this
is symmetric in $\theta$; thus $(d/d\theta) f(a; \theta)|_{\theta=0} = 0$, showing that the distribution of $a$
is first-derivative ancillary at $\theta = 0$ or, more intuitively, that the amount of probability
on a contour of $a$ is first-derivative free of $\theta$ at $\theta = 0$. Of course, for this, the $y$-spacing
between contours of $a$ is constant.
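The symmetry argument can be checked by direct numerical integration. The sketch below is an illustration under the unit-variance assumption of the text: with $(x, y)$ independent Normal with means $(\theta, 0)$ and $a = y + cx^2/2$, the density of $a$ is computed as the integral over $x$ of the product of the Normal density of $x$ at mean $\theta$ and the Normal density of $y = a - cx^2/2$. The function names, grid settings and the particular values of $a$ and $c$ are choices made here for illustration.

```python
import numpy as np

def phi(z):
    """Standard Normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def marginal_density(a, theta, c, half_width=12.0, m=20001):
    """Density of a = y + c x^2 / 2 when x ~ N(theta, 1), y ~ N(0, 1) independently.

    Integrates phi(x - theta) * phi(a - c x^2 / 2) over x on a symmetric grid.
    """
    x = np.linspace(-half_width, half_width, m)
    dx = x[1] - x[0]
    return np.sum(phi(x - theta) * phi(a - 0.5 * c * x**2)) * dx

# Illustrative values of the contour label a and the curvature c (assumed).
a, c = 0.4, 0.7
f_plus = marginal_density(a, 0.3, c)
f_minus = marginal_density(a, -0.3, c)

h = 1e-4                                     # step for a central difference at theta = 0
slope_at_zero = (marginal_density(a, h, c) - marginal_density(a, -h, c)) / (2 * h)
print(f_plus - f_minus, slope_at_zero)
```

The symmetric grid mirrors the change of variable from $x$ to $-x$ used in the argument, so the two densities agree up to rounding and the finite-difference slope at $\theta = 0$ is numerically zero.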

Now, more generally, consider an asymptotic distribution for $(x, y)$ that is first order
rotationally symmetric Normal with mean $\theta$ on the $y = 0$ plane; this allows $O(n^{-1/2})$
cubic contributions. Also, consider an $s$-dimensional variable $a = y + Q(x)/2n^{1/2}$ which
is a quadratic adjustment of $y$. The preceding argument extends to show that $a(y)$ is
first-derivative ancillary: the two $O(n^{-1/2})$ effects are zero and the combination is of the
next order.

(2c) Trajectories are ancillary: parameter change in moderate deviations. Now, consider
a statistical model $f(y; \theta)$ with data point $y^0$ and assume regularity, asymptotics
and smoothness of the quantile functions. We examine the parameter trajectory $\{y(x^0; t)\}$
in moderate deviations under change in $t$. From the preceding paragraph, we then have
first-derivative ancillarity at $\theta = \hat\theta = 0$. But this holds for each expansion in moderate
deviations and we thus have ancillarity in moderate deviations. The key here has been
to use the expansion form about the point that has $\hat\theta$ equal to the parameter value being
examined.

6. Discussion

(i) On ancillarity. The Introduction gave a brief background on ancillary statistics 
and noted that an ancillary is typically viewed as a statistic with a parameter-free dis- 
tribution; for some recent discussion, see [17]. Much of the literature is concerned with 
difficulties that can arise using this third Fisher concept, third after sufficiency and likeli- 
hood: that maximizing power given size typically means not conditioning on an ancillary; 
that shorter on-average confidence intervals typically mean ignoring ancillary condition- 
ing; that techniques that are conditional on an ancillary are often inadmissible; and more. 
Some of the difficulty may hinge on whether there is merit in the various optimality crite- 
ria themselves. However, little in the literature seems focused on the continued evolution 
and development of this Fisher concept, that is, on what modifications or evolution can 
continue the exploration initiated in Fisher's original papers (see [10-12]). 

(ii) On simulations for the conditional model. The second order ancillary in moderate 
deviations has contours that form a partition, as shown in the preceding section. In the 
modified or re-expressed coordinates, the contours are in a location relationship and, 
correspondingly, the Jacobian effect needed for the conditional distribution is constant. 
However, in the original coordinates, the Jacobian effect would typically not be constant 
and its effect would be needed for simulations. If the parameter is scalar, then the effect 
is available to the second order through the divergence function of a vector field; for 
some discussion and examples, see [15]. For a vector parameter, generalizations can be 
implemented, but we do not pursue these here. 

(iii) Marginal or conditional. When sampling from a scalar distribution having variable
$y$ and moderate regularity, the familiar central limit theorem gives a limiting Normal
distribution for the sample average $\bar y$ or sample sum $\sum y_i$. From a geometric view, we
have probability in $n$-space and contours determined by $\bar y$, contours that are planes
perpendicular to the 1-vector. If we then collect the probability on a contour, plus or
minus a differential, and deposit it, say, on the intersection of the contour with the span
$\mathcal{L}(1)$ of the 1-vector, then we obtain a limiting Normal distribution on $\mathcal{L}(1)$, using $\bar y$ or
$\sum y_i$ for location on that line.

A far less familiar Normal limit result applies in the same general context, but with
a totally different geometric decomposition. Consider lines parallel to the 1-vector, the
affine cosets of $\mathcal{L}(1)$. On these lines, plus or minus a differential, we then obtain a limiting
Normal distribution for location, say $\bar y$ or $\sum y_i$. In many ways, this conditional, rather
than marginal, analysis is much stronger and more useful. The geometry, however, is
different, with planes perpendicular to $\mathcal{L}(1)$ being replaced by points on lines parallel to
$\mathcal{L}(1)$.

This generalizes, giving a limiting conditional Normal distribution on almost arbitrary
smooth contours in a partition and it has wide application in recent likelihood inference 
theory. It also provides third order accuracy rather than the first order accuracy asso- 
ciated with the usual geometry. In a simple sense, planes are replaced by lines or by 
generalized contours and much stronger, though less familiar, results are obtained. For 
some background based on Taylor expansions of log-statistical models, see [5, 6] and [1]. 






Acknowledgements 

This research was supported by the Natural Sciences and Engineering Research Council 
of Canada. The authors wish to express deep appreciation to the referee for very incisive 
comments. We also offer special thanks to Kexin Ji for many contributions and support 
with the manuscript and the diagrams. 

References 

[1] Andrews, D.F., Fraser, D.A.S. and Wong, A. (2005). Computation of distribution functions
from likelihood information near observed data. J. Statist. Plann. Inference 134 180-193.
MR2146092

[2] Barndorff-Nielsen, O.E. (1986). Inference on full or partial parameters based on the
standardized log likelihood ratio. Biometrika 73 307-322. MR0855891

[3] Barndorff-Nielsen, O.E. (1987). Discussion of "Parameter orthogonality and approximate
conditional inference." J. R. Stat. Soc. Ser. B Stat. Methodol. 49 18-20. MR0893334

[4] Berger, J.O. and Sun, D. (2008). Objective priors for the bivariate normal model. Ann. 
Statist. 36 963-982. MR2396821 

[5] Cakmak, S., Fraser, D.A.S. and Reid, N. (1994). Multivariate asymptotic model: Exponen- 
tial and location approximations. Util. Math. 46 21-31. MR1301292 

[6] Cheah, P.K., Fraser, D.A.S. and Reid, N. (1995). Adjustment to likelihood and densities: 
Calculating significance. J. Statist. Res. 29 1-13. MR1345317 

[7] Cox, D.R. (1980). Local ancillarity. Biometrika 67 279-286. MR0581725

[8] Cox, D.R. and Reid, N. (1987). Parameter orthogonality and approximate conditional in- 
ference. J. R. Stat. Soc. Ser. B Stat. Methodol. 49 1-39. MR0893334 

[9] Daniels, H.E. (1954). Saddle point approximations in statistics. Ann. Math. Statist. 25 
631-650. MR0066602 

[10] Fisher, R.A. (1925). Theory of statistical estimation. Proc. Camb. Phil. Soc. 22 700-725. 
[11] Fisher, R.A. (1934). Two new properties of mathematical likelihood. Proc. R. Soc. Lond. 
Ser. A 144 285-307. 

[12] Fisher, R.A. (1935). The logic of inductive inference. J. R. Stat. Soc. 98 39-54.

[13] Fisher, R.A. (1956). Statistical Methods and Scientific Inference. Edinburgh: Oliver & Boyd. 
[14] Fraser, D.A.S. (1979). Inference and Linear Models. New York: McGraw-Hill. MR0535612 
[15] Fraser, D.A.S. (1993). Directional tests and statistical frames. Statist. Papers 34 213-236. 
MR1241598 

[16] Fraser, D.A.S. (2003). Likelihood for component parameters. Biometrika 90 327-339. 
MR1986650 

[17] Fraser, D.A.S. (2004). Ancillaries and conditional inference, with discussion. Statist. Sci.
19 333-369. MR2140544

[18] Fraser, D.A.S. and Reid, N. (1995). Ancillaries and third order significance. Util. Math. 47
33-53. MR1330888

[19] Fraser, D.A.S. and Reid, N. (2001). Ancillary information for statistical inference. In Em- 
pirical Bayes and Likelihood Inference (S.E. Ahmed and N. Reid, eds.) 185-207. New 
York: Springer. MR1855565 

[20] Fraser, D.A.S. and Reid, N. (2002). Strong matching for frequentist and Bayesian inference. 
J. Statist. Plann. Inference 103 263-285. MR1896996 






A.M. Fraser, D.A.S. Fraser and A.-M. Staicu 



[21] Fraser, D.A.S., Reid, N., Marras, E. and Yi, G.Y. (2010). Default priors for Bayes and
frequentist inference. J. R. Stat. Soc. Ser. B Stat. Methodol. To appear.

[22] Fraser, D.A.S., Reid, N. and Wu, J. (1999). A simple general formula for tail probabilities
for Bayes and frequentist inference. Biometrika 86 249-264. MR1705367

[23] Fraser, D.A.S. and Rousseau, J. (2008). Studentization and deriving accurate p-values.
Biometrika 95 1-16. MR2409711

[24] Fraser, D.A.S., Wong, A. and Wu, J. (1999). Regression analysis, nonlinear or nonnormal:
Simple and accurate p-values from likelihood analysis. J. Amer. Statist. Assoc. 94
1286-1295. MR1731490

[25] Lugannani, R. and Rice, S. (1980). Saddlepoint approximation for the distribution of the 
sum of independent random variables. Adv. in Appl. Probab. 12 475-490. MR0569438 
[26] McCullagh, P. (1984). Local sufficiency. Biometrika 71 233-244. MR0767151 
[27] McCullagh, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259. 
MR1185127 

[28] Reid, N. and Fraser, D.A.S. (2010). Mean likelihood and higher order inference. Biometrika 
97. To appear. 

[29] Severini, T.A. (2001). Likelihood Methods in Statistics. Oxford: Oxford Univ. Press. 
MR1854870 

Received January 2009 and revised December 2009 



